STIMULUS SAMPLING THEORY FOR A CONTINUUM OF RESPONSES


Patrick Suppes 


1. Introduction. 

The aim of the present investigation is to extend stimulus sampling 
theory to situations involving a continuum of possible responses. The 
theory for a finite number of responses stems from the basic paper of Estes [2];
the present formulation will resemble most closely that given for the finite 
case in Suppes and Atkinson [4]. In a previous study (Suppes [3]) I was 
concerned with a corresponding extension of linear learning models, and 
several results of that study are, as we shall see, closely related to the 
present one. 

The experimental situation consists of a sequence of trials. On 
each trial the subject (of the experiment) makes a response from a continuum 
of possible responses; his response is followed by a reinforcing event in- 
dicating the correct response for that trial. In situations of simple learning, 
which are characterized by a constant stimulating situation, responses and 
reinforcements constitute the only observable data, but stimulus sampling 
theory postulates a considerably more complicated process which involves the 
conditioning and sampling of stimuli. In the finite case the usual assumption 
is that on any trial each stimulus is conditioned to exactly one response. 
Such a highly discontinuous assumption seems inappropriate for a continuum 
of responses, and I have replaced it with the postulate that the conditioning
of each stimulus is smeared over a certain interval of responses,
possibly the whole continuum. In these terms, the conditioning of any
stimulus may be represented uniquely by a smearing distribution. These 
distributions, one for each stimulus, will play the same role as did the 
single smearing distribution introduced in my earlier paper on linear 
models [3]. 

The theoretically assumed sequence of events on any trial may
then be described as follows:


trial begins with each stimulus in a certain state of conditioning → stimuli are sampled → response occurs → reinforcement occurs → possible change in conditioning occurs.


The sequence of events just described is, in broad terms, postulated to 
be the same for finite and infinite sets of possible responses. 
Differences of detail will become clear. The main point of the axioms
in the next section is to state specific hypotheses about this sequence 
of events. As has already been more or less indicated, three kinds of 
axioms are needed, namely, those concerning conditioning, those concern- 
ing sampling, and those concerning responses. 

The third section contains some general theorems of the theory. 
The fourth section considers in some detail the classical case of
noncontingent reinforcement. The fifth section treats of other cases more
superficially.

Although no experimental data will be described in this paper,
it will perhaps help in intuitively understanding the theory to describe
schematically one piece of apparatus which has been used to test the theory
extensively. The subject is seated facing a large circular vertical disc.
He is told that his task on each trial is to predict by means of a pointer
where a spot of light will appear on the rim of the disc. The subject's
pointer predictions are his responses in the sense of the theory. At the
end of each trial the "correct" position of the spot is shown to the subject,
which is the reinforcing event for that trial. The most important variable
controlled by the experimenter is the choice of a particular probability
distribution of reinforcement.


2. Axioms. 

The axioms are formulated verbally but with some effort to convey a
sense of formal precision. It is not difficult, although not wholly routine,
to convert them into a mathematically exact form. As already indicated, they
fall naturally into three groups. In the statement of the axioms we use x
for the response variable and z for the parameter of the smearing
distribution $K_s(x;z)$ of any stimulus s. Moreover, z is the mode of the
distribution; for the circular disc apparatus it is also assumed to be the
mean, but not all apparatus to which the theory applies is so completely
symmetric.


CONDITIONING AXIOMS

C1. For each stimulus s there is on every trial a unique smearing
distribution $K_s(x;z)$ on the interval [a,b] of possible responses such
that

(a) the distribution $K_s(x;z)$ is determined by its mode z and its variance;

(b) the variance is constant over trials for a fixed stimulating situation;

(c) the distribution $K_s(x;z)$ is continuous and piecewise differentiable
in both variables.


C2. If a stimulus is sampled on a trial the mode of its smearing
distribution becomes, with probability $\theta$, the point of the response
(if any) which is reinforced on that trial; with probability $1-\theta$ the
mode remains unchanged.


C3. If no reinforcement occurs on a trial there is no change in
the smearing distributions of sampled stimuli.


C4. Stimuli which are not sampled on a given trial do not change
their smearing distributions on that trial.


of events. 


SAMPLING AXIOMS 


S1. Exactly one stimulus is sampled on each trial.


S2. Given the set of stimuli available for sampling on a given
trial, the probability of sampling a given element is independent of the


RESPONSE AXIOMS 


R1. If the sampled stimulus s and the mode z of its smearing


Because of the similarity of these axioms to those in Suppes and
Atkinson [4], I shall here mainly comment on those aspects peculiar to
the continuum case. In the finite case the complicated form of Axiom C1
reduces simply to the assertion that on any trial each stimulus is
conditioned to exactly one response. As already remarked, the assumption
(C1(a)) that the smearing distribution of any stimulus is determined by
its mode and variance, rather than its mean and variance, is used in order
to permit application of the theory to unsymmetrical apparatus. For
instance, suppose the experimental set-up consists of a bar a meter or so
in length on which the subject is to set a pointer to predict the occurrence
of a spot of light. It seems unreasonable to suppose that the conditioning
effect of a reinforcement near the end points of the bar will be smeared
symmetrically to the left and to the right. For such a situation the mean of
the smearing distribution (of a sampled stimulus) may not be at the point of
reinforcement even though conditioning is effective. On the other hand, it
seems psychologically sound that the mode of the smearing distribution will be
at the point of reinforcement, granted the effectiveness of conditioning. In the
present formulation of the theory it is essential to have the one free
parameter of the smearing distribution closely tied to the points of
reinforcement, for when conditioning is effective, which occurs with
probability $\theta$, this parameter assumes the value of the point of
reinforcement (Axiom C2). This corresponds to the assumption in the finite
response case that with probability $\theta$ sampled stimuli become conditioned
or connected to the reinforced response.
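The contrast between mode and mean can be made concrete numerically. The sketch below is a minimal illustration assuming, purely for the example, that the smearing density is a normal curve with mode z and fixed spread, truncated to a bar represented by [0, 1]; the theory itself does not commit to this family, and the names `truncated_normal_pdf` and `mean_and_mode` are ours. For a reinforcement near the right end point the mode stays at the reinforced point while the mean is pulled toward the interior.

```python
import math


def truncated_normal_pdf(x, z, sd, a=0.0, b=1.0):
    """Illustrative smearing density k(x; z): a normal curve with mode z and
    fixed spread sd, truncated to the bar [a, b] and renormalized.
    The normal shape is an assumption, not part of the theory."""
    if x < a or x > b:
        return 0.0
    root2 = math.sqrt(2.0)
    # Normal probability mass inside [a, b], used to renormalize.
    mass = 0.5 * (math.erf((b - z) / (sd * root2)) - math.erf((a - z) / (sd * root2)))
    return math.exp(-0.5 * ((x - z) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi) * mass)


def mean_and_mode(z, sd, a=0.0, b=1.0, m=20000):
    """Midpoint-rule mean and grid mode (argmax) of the truncated density."""
    h = (b - a) / m
    xs = [a + (i + 0.5) * h for i in range(m)]
    ws = [truncated_normal_pdf(x, z, sd, a, b) for x in xs]
    mean = h * sum(x * w for x, w in zip(xs, ws))
    mode = xs[max(range(m), key=lambda i: ws[i])]
    return mean, mode


# A reinforcement near the right end of the bar: the mode stays at z = 0.95,
# but the truncated right tail leaves the mean pulled inward.
mean, mode = mean_and_mode(z=0.95, sd=0.1)
print(f"mode = {mode:.3f}, mean = {mean:.3f}")
```

With z = 0.95 and a spread of 0.1 the mean comes out near 0.90, while for a reinforcement at the center of the bar mode and mean coincide.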

The remaining conditioning axioms (C3, C4, C5) have almost exactly
the form which is also appropriate for the finite case. The same is true
of the two sampling axioms. In contrast, the first response axiom, R1,
has a much simpler form in the finite case: with probability one that
response is made to which the sampled stimulus is conditioned. Axiom R1
generalizes this assumption in the obvious manner in terms of the smearing
distribution of the sampled stimulus.

The three axioms C5, S2 and R2 are what have been termed in the
literature independence of path assumptions. Only R2 is new here; the
other two are also needed in the finite case. These three axioms are
crucial in the proof that for simple reinforcement schedules the sequence
of random variables which take as values the modes of the smearing
distributions of the stimuli constitutes a continuous state Markov process.

For mathematical analysis in the remainder of this paper it will be
useful to introduce some notation. In particular, we need notation for
five random variables, their values and their distributions, as well as a
notation for their joint distribution. Three of these random variables
take values in the interval [a,b], the continuum of possible responses
and reinforcements fixed throughout the paper. Thus we have for trial n:


(i) the response random variable $X_n$ with values $x_n$ or
simply x, distribution $R_n$ and density $r_n$;

(ii) the reinforcement random variable $Y_n$ with values $y_n$ or y,
distribution $F_n$ and density $f_n$;

(iii) the smearing parameter random variable $Z_{s,n}$ of stimulus s
with values $z_{s,n}$ or $z_s$, distribution $G_{s,n}$ and density $g_{s,n}$. As
indicated already, $z_s$ is the mode of the smearing distribution of
stimulus s. The random variable $Z_n$, without the subscript s, shall
take as values finite vectors $z = (z_{s_1}, \ldots, z_{s_N})$ relative to the
ordering $(s_1, \ldots, s_N)$ of the set S of stimuli.
1? N 


We also need for occasional use:

(iv) the sampling random variable $S_n$ with values $s_n$ or s
for the sampled stimulus, and discrete density $\sigma_n$ (it is always assumed
that the set S of stimuli is finite);

(v) the effectiveness of conditioning random variable $D_n$ with
value 1 for effective and 0 for non-effective, and probability $\theta$ of
value 1, following Axiom C2. I use $\delta_n$ for values of $D_n$. Thus
always $\delta_n = 1$ or 0.
I use $J_n$ for the joint distribution of any finite sequence of
these random variables, the last of which occurs on trial n, and $j_n$ for
the corresponding density. For occasional reference to points in the
underlying sample space, $\omega$ is used. Finally, the notation $K_s(x;z)$
for the smearing distribution of stimulus s was introduced earlier.
In terms of the five random variables introduced the postulated
sequence of events on any trial, which was described informally before,
may be symbolized:

$$Z_n \to S_n \to X_n \to Y_n \to D_n \to Z_{n+1}.$$

Note that the value of the random variable $Z_n$ represents the conditioning
of each stimulus at the beginning of trial n, for in the present continuous
theory conditioning is in terms of a one-parameter family of smearing
distributions.

It will also be useful to give a more precise formulation of the
response axioms, R1 and R2, in terms of the notation just introduced. It
is intended that R1 simply asserts:

$$P(a_1 \le X_n \le a_2 \mid S_n = s, Z_{s,n} = z) = \int_{a_1}^{a_2} j_n(x \mid s, z)\,dx = K_s(a_2;z) - K_s(a_1;z).$$
Axiom R2 states an independence of path assumption. Let $\omega_{n-1}$ be any
sequence of outcomes of the random variables defined up to trial n-1.
Then R2 asserts:

$$\int_{a_1}^{a_2} j_n(x \mid s, z, \omega_{n-1})\,dx = \int_{a_1}^{a_2} j_n(x \mid s, z)\,dx = K_s(a_2;z) - K_s(a_1;z).$$





Indication of some obvious relations for the response density $r_n$ will
also be helpful later. First, we have that

$$r_n(x) = j_n(x),$$

i.e., $r_n$ is just the marginal density obtained from the joint distribution
$J_n$. Second, we have "expansions" like

$$r_n(x) = \sum_{s \in S}\int_a^b j_n(x, s, z_s)\,dz_s,$$

$$r_n(x) = \sum_{s \in S}\int_a^b\int_a^b\int_a^b j_n(x, s, z_s, y_{n-1}, x_{n-1})\,dz_s\,dy_{n-1}\,dx_{n-1}.$$


3. General Theorems. 

This section contains a few general theorems which mostly correspond
to ones which have proved useful in experimental work with the finite case.
It is assumed that the reinforcement distribution $F_n$, which is selected by
the experimenter, is always continuous and piecewise differentiable in all
variables. Under these assumptions and those of Axiom C1 on the smearing
distributions, no questions of integrability arise. Proofs of the first
theorems are rather explicit in order to indicate the role of the axioms.


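Theorem 1. (General Response Theorem)

$$r_n(x) = \sum_{s \in S}\sigma_n(s)\int_a^b k_s(x;z_s)\,g_{s,n}(z_s)\,dz_s, \tag{1}$$

where $k_s(x;z)$ denotes the density (in x) of the smearing distribution $K_s(x;z)$.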
Proof: Mainly by virtue of Axiom S1, which asserts that exactly
one stimulus is sampled on each trial,

$$r_n(x) = \sum_{s \in S}\int_a^b j_n(x, s, z_s)\,dz_s = \sum_{s \in S}\int_a^b j_n(x \mid s, z_s)\,j_n(s \mid z_s)\,j_n(z_s)\,dz_s. \tag{2}$$

In view of Axiom C1 and Axiom R1

$$j_n(x \mid s, z_s) = k_s(x;z_s); \tag{3}$$

from Axiom S2, the independence of path assumption on sampling,

$$j_n(s \mid z_s) = \sigma_n(s); \tag{4}$$

and on the basis of the notation introduced in the last section

$$j_n(z_s) = g_{s,n}(z_s). \tag{5}$$

The theorem follows immediately from (2)-(5). Q.E.D.
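Theorem 1 says that the response density is a mixture of the smearing densities, weighted by the sampling probabilities and the current distribution of each mode. The following sketch checks this on a hypothetical two-stimulus example in which responses are discretized into 50 cells, each mode distribution $g_{s,n}$ is concentrated on a few points, and each $k_s$ is a renormalized normal curve; all of these choices, and the names used, are illustrative assumptions. It compares the mixture formula with a direct simulation of the sampling and response steps.

```python
import math
import random

GRID = [(i + 0.5) / 50 for i in range(50)]  # response cells on [a, b] = [0, 1]

# Hypothetical two-stimulus example: sampling probability sigma(s), spread of
# the smearing kernel k_s, and a discrete distribution g_s of its mode z_s.
STIMULI = {
    "s1": (0.7, 0.05, {0.2: 0.5, 0.6: 0.5}),
    "s2": (0.3, 0.15, {0.8: 1.0}),
}


def kernel(z, sd):
    """Discretized smearing kernel k_s(x; z): normal shape with mode z,
    renormalized over GRID (the normal shape is an assumption)."""
    w = [math.exp(-0.5 * ((x - z) / sd) ** 2) for x in GRID]
    total = sum(w)
    return [v / total for v in w]


def response_distribution():
    """Theorem 1: r(x) = sum_s sigma(s) * sum_z k_s(x; z) g_s(z)."""
    r = [0.0] * len(GRID)
    for sigma, sd, g in STIMULI.values():
        for z, gz in g.items():
            for i, kx in enumerate(kernel(z, sd)):
                r[i] += sigma * gz * kx
    return r


def simulate(trials, seed=1):
    """Generate responses trial by trial: sample a stimulus (S1, S2), read off
    its mode, then draw a response from its smearing kernel (R1)."""
    rng = random.Random(seed)
    names = list(STIMULI)
    sample_w = [STIMULI[n][0] for n in names]
    kernels = {(n, z): kernel(z, STIMULI[n][1]) for n in names for z in STIMULI[n][2]}
    counts = [0] * len(GRID)
    for _ in range(trials):
        s = rng.choices(names, weights=sample_w)[0]
        g = STIMULI[s][2]
        z = rng.choices(list(g), weights=list(g.values()))[0]
        i = rng.choices(range(len(GRID)), weights=kernels[(s, z)])[0]
        counts[i] += 1
    return [c / trials for c in counts]


r = response_distribution()
p_exact = sum(r[:25])               # P(X <= 0.5) from Theorem 1
p_sim = sum(simulate(20000)[:25])   # the same probability, estimated by simulation
print(f"P(X <= 0.5): Theorem 1 gives {p_exact:.3f}, simulation gives {p_sim:.3f}")
```

The two numbers should agree up to sampling error of a few thousandths.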

The next theorem asserts the Markov property which is essential
for further deductive developments of the theory. It is a straightforward
matter to generalize this theorem to more complicated reinforcement
distributions which depend on the actual responses or reinforcements on
several preceding trials; the generality of the present theorem is
sufficient for our purposes here.


Theorem 2. (Markov Theorem). If the reinforcement distribution
F(y) on trial n is independent of n and depends only on the immediately
preceding response on trial n, then the sequence of random variables
$\langle Z_1, Z_2, \ldots, Z_n, \ldots \rangle$ is a continuous state Markov process.


Proof: By direct probability considerations, for $n > 1$,

$$j_n(z_n \mid z_{n-1}, \ldots, z_1) = \sum_{s_{n-1}}\sum_{\delta_{n-1}}\int_a^b\int_a^b j_n(z_n \mid \delta_{n-1}, y_{n-1}, x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1)\, j_{n-1}(\delta_{n-1} \mid y_{n-1}, x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1)\, j_{n-1}(y_{n-1} \mid x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1)\, j_{n-1}(x_{n-1} \mid s_{n-1}, z_{n-1}, \ldots, z_1)\, j_{n-1}(s_{n-1} \mid z_{n-1}, \ldots, z_1)\,dy_{n-1}\,dx_{n-1}. \tag{6}$$
Now by Axiom C2, if $\delta_{n-1} = 1$ then

$$j_n(z_n \mid \delta_{n-1}, y_{n-1}, x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1) = 1$$

provided the vector $z_n$ is equal to $y_{n-1}$ in its coordinate for stimulus $s_{n-1}$
and, by Axiom C4, agrees with $z_{n-1}$ in its other coordinates; otherwise it is
equal to 0. If $\delta_{n-1} = 0$, then this conditional density is 1 if $z_n = z_{n-1}$,
and 0 otherwise. In each of these cases the value of
$j_n(z_n \mid \delta_{n-1}, y_{n-1}, x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1)$ is not affected
by $x_{n-1}$ or by $z_{n-2}, \ldots, z_1$. Secondly, by virtue of Axiom C5

$$j_{n-1}(\delta_{n-1} \mid y_{n-1}, x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1) = j_{n-1}(\delta_{n-1}).$$

Thirdly, on the basis of the hypothesis of the theorem

$$j_{n-1}(y_{n-1} \mid x_{n-1}, s_{n-1}, z_{n-1}, \ldots, z_1) = f(y_{n-1} \mid x_{n-1}).$$

Fourthly, in view of Axioms R1 and R2

$$j_{n-1}(x_{n-1} \mid s_{n-1}, z_{n-1}, \ldots, z_1) = j_{n-1}(x_{n-1} \mid s_{n-1}, z_{n-1}).$$

Finally, in view of Axiom S2

$$j_{n-1}(s_{n-1} \mid z_{n-1}, \ldots, z_1) = \sigma_{n-1}(s_{n-1}).$$

When all these results of applying the "independence of path" assumptions
are substituted in (6), and the summations and integrations are performed
on the result, we have that

$$j_n(z_n \mid z_{n-1}, \ldots, z_1) = j_n(z_n \mid z_{n-1}),$$

the desired result. Q.E.D. Some readers may feel that the above theorem
could have been assumed as an axiom, but this is to misunderstand the 
character of the theorem in the context of the general stimulus sampling 
theory formulated by the axioms. The axioms on which this theorem is based 
are of a general nature and are concerned with fundamental aspects of the 
postulated psychological process of learning. In contrast, the theorem is 
relatively restricted, dealing as it does with only a small class of the 
possible schedules of reinforcement. 

We turn now to some recursion theorems for various quantities; of
particular interest is that for response probabilities. It is possible to
state and prove these theorems under the general assumption of N stimuli
in the set S. However, both computations and notation become rather
cumbersome, so that at this stage of development of the theory it is a
reasonable simplification to impose the following.


Restrictive Hypothesis: There is exactly one stimulus element in S.

Probabilities enter the theory for a continuum of responses in so many
different ways that it is certainly not now possible to distinguish
empirically between models with different numbers of stimuli when the
stimulation is constant. And in the case of discrimination experiments,
each stimulating situation may be treated as a single stimulus, which
entails that on each trial there is exactly one stimulus available for
sampling, although the set S may contain more than one element. As a
matter of fact, this restrictive hypothesis of a single stimulus is
already a practical necessity for complicated reinforcement situations
in the finite case (see, for instance, Atkinson and Suppes [1]).
We begin with a recursion for the distribution $G_n$ of the smearing
parameter z of the single stimulus. (On the assumption of a single
stimulus we drop the subscript s.)





Theorem 3.

$$g_{n+1}(z) = (1-\theta)\,g_n(z) + \theta\,f_n(z). \tag{7}$$

Proof: By Axiom S1 the single stimulus is sampled on every trial, so
Axiom C2 applies on every trial. If conditioning is effective then
$z_{n+1} = y_n$, and thus the distribution of $z_{n+1}$ is that of $y_n$, which is $f_n$. On
the other hand, if conditioning is not effective, then $z_{n+1} = z_n$ and
thus the distribution of $z_{n+1}$ is simply $g_n$. By Axiom C2 the probability
of the first alternative is $\theta$, and that of the second $1-\theta$, which yields
the theorem. Q.E.D.


In the familiar notation of the finite case, where $A_{i,n}$ is
response i on trial n and $E_{j,n}$ is reinforcing event j on trial n,
(7) corresponds to:

$$P(A_{i,n+1}) = (1-\theta)\,P(A_{i,n}) + \theta\,P(E_{i,n}). \tag{8}$$








For the response density $r_n$ we have:

Theorem 4.

$$r_{n+1}(x) = (1-\theta)\,r_n(x) + \theta\int_a^b k(x;y)\,f_n(y)\,dy. \tag{9}$$

Proof: We have at once from Theorem 1,

$$r_{n+1}(x) = \int_a^b k(x;z)\,g_{n+1}(z)\,dz.$$
Applying Theorem 3 to the right-hand side, we have:

$$\begin{aligned} r_{n+1}(x) &= \int_a^b k(x;z)\,[(1-\theta)g_n(z) + \theta f_n(z)]\,dz \\ &= (1-\theta)\int_a^b k(x;z)\,g_n(z)\,dz + \theta\int_a^b k(x;z)\,f_n(z)\,dz \\ &= (1-\theta)\,r_n(x) + \theta\int_a^b k(x;y)\,f_n(y)\,dy, \end{aligned}$$

where the variable of integration is changed in the second integral on
the right. Q.E.D.
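Because (7) and (9) are linear, they are easy to check numerically. The sketch below assumes a discretized version of the one-element model (40 cells on [0, 1], a renormalized normal smearing kernel, and two hypothetical reinforcement densities used on alternate trials, since Theorem 4 allows $f_n$ to depend on n). It propagates $g_n$ by (7), obtains $r_n$ from Theorem 1, and measures how far $r_{n+1}$ departs from the right-hand side of (9). All names and numerical values are illustrative.

```python
import math

M = 40
GRID = [(i + 0.5) / M for i in range(M)]   # cells for x, y and z on [0, 1]
THETA = 0.2                                # illustrative learning parameter
SD = 0.08                                  # illustrative spread of k(x; z)


def normalize(w):
    total = sum(w)
    return [v / total for v in w]


def bump(center, sd):
    """Renormalized normal shape on GRID (an illustrative choice)."""
    return normalize([math.exp(-0.5 * ((t - center) / sd) ** 2) for t in GRID])


K = [bump(z, SD) for z in GRID]   # K[j][i] approximates k(x_i; z_j)


def smear(dist):
    """Discrete version of  x -> integral of k(x; z) dist(z) dz."""
    out = [0.0] * M
    for j, pz in enumerate(dist):
        for i, kx in enumerate(K[j]):
            out[i] += kx * pz
    return out


# Theorem 4 allows f_n to vary with n; alternate two hypothetical schedules.
F_EVEN, F_ODD = bump(0.3, 0.1), bump(0.7, 0.1)

g = [1.0 / M] * M        # g_1: uniform distribution of the mode
max_gap = 0.0
for n in range(1, 30):
    f_n = F_EVEN if n % 2 == 0 else F_ODD
    r_n = smear(g)                                                  # Theorem 1
    g = [(1 - THETA) * gz + THETA * fz for gz, fz in zip(g, f_n)]   # Theorem 3, (7)
    r_next = smear(g)
    kf = smear(f_n)
    predicted = [(1 - THETA) * u + THETA * v for u, v in zip(r_n, kf)]  # Theorem 4, (9)
    max_gap = max(max_gap, max(abs(u - v) for u, v in zip(r_next, predicted)))

print(f"largest |r_(n+1) - recursion (9)| over 29 trials: {max_gap:.1e}")
```

The discrepancy is at the level of floating-point rounding, as it should be, since (9) is obtained from (7) by integrating against k.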

Robert R. Bush suggested that it is of interest to see what
happens when the interval [a,b] is cut into a finite number of parts
and the resulting finite response case is studied. For simplicity, we
may divide the interval into exactly two parts. Let a < c < b, and
call $X_{1,n}$ a response on trial n in the interval [a,c], and $X_{2,n}$ a
response on trial n in [c,b]. Clearly

$$P(X_{1,n}) = R_n(c) - R_n(a) = R_n(c),$$

$$P(X_{2,n}) = R_n(b) - R_n(c) = 1 - R_n(c).$$


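And by integrating (9) of Theorem 4, we have at once

Theorem 5.

$$\begin{aligned} P(X_{1,n+1}) &= (1-\theta)\,P(X_{1,n}) + \theta\int_a^b\int_a^c k(x;y)\,f_n(y)\,dx\,dy, \\ P(X_{2,n+1}) &= (1-\theta)\,P(X_{2,n}) + \theta\int_a^b\int_c^b k(x;y)\,f_n(y)\,dx\,dy. \end{aligned} \tag{10}$$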
The recursions for $X_{1,n}$ and $X_{2,n}$ may be regarded as a generalization
of (8) for the finite case when a continuous smearing of the effects of
reinforcement is postulated. By further specialization, it is possible
to get an exact analogue of (8). Let us suppose that there are only two
points of reinforcement, one the midpoint $y_1$ of the interval [a,c], and
the other the midpoint $y_2$ of the interval [c,b]. Suppose moreover that
the smearing densities around these two points of reinforcement are
strictly positive only in the subinterval [a,c] or [c,b] as the case may be.


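Then

$$\int_a^c k(x;y_1)\,dx = 1, \qquad \int_c^b k(x;y_2)\,dx = 1,$$

and, writing $P(Y_{1,n})$ for the probability that reinforcement occurs at $y_1$
on trial n, under these suppositions (10) becomes:

$$P(X_{1,n+1}) = (1-\theta)\,P(X_{1,n}) + \theta\,P(Y_{1,n}),$$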
an exact analogue of (8). (Naturally weaker suppositions will also
yield such an analogue, but the present example is illustrative of one
method for obtaining the finite case from the continuous one.)
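The passage from (10) to the finite recursion can be illustrated with a small calculation. In this sketch we assume, for concreteness, [a, c, b] = [0, 0.5, 1], reinforcement only at the midpoints $y_1 = 0.25$ and $y_2 = 0.75$, and a hypothetical triangular smearing density of half-width 0.25, so that the density around each reinforcement point vanishes outside its own subinterval. Provided the mode starts at $y_1$ or $y_2$ it stays on these two points, and $P(X_{1,n})$ computed from the continuous model coincides with the finite recursion (8). The values of $\theta$ and of the reinforcement probability are illustrative.

```python
A, C, B = 0.0, 0.5, 1.0
Y1, Y2 = (A + C) / 2, (C + B) / 2   # the two reinforcement points (midpoints)
H = 0.25          # half-width of the smearing density: support stays in one subinterval
THETA = 0.15      # illustrative learning parameter
PI1 = 0.7         # illustrative probability that reinforcement occurs at y_1


def tri_pdf(x, y):
    """Hypothetical triangular smearing density with mode y on [y - H, y + H]."""
    d = abs(x - y)
    return (H - d) / H ** 2 if d < H else 0.0


def mass(y, lo, hi, m=4000):
    """Midpoint-rule integral of the smearing density over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(tri_pdf(lo + (i + 0.5) * h, y) for i in range(m))


MASS1, MASS2 = mass(Y1, A, C), mass(Y2, A, C)   # about 1 and exactly 0
p_z1 = 0.2                 # P(mode at y_1) on trial 1 (illustrative)
p_finite = p_z1            # the finite-case recursion (8), same starting value
for n in range(1, 21):
    p_x1 = p_z1 * MASS1 + (1 - p_z1) * MASS2        # P(X_{1,n}) in the continuous model
    assert abs(p_x1 - p_finite) < 1e-9
    p_z1 = (1 - THETA) * p_z1 + THETA * PI1          # Theorem 3 restricted to two points
    p_finite = (1 - THETA) * p_finite + THETA * PI1  # recursion (8)

print(f"P(X_1) on trial 20: {p_x1:.4f}")
```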

The suppositions just made to yield (8) may also be used to 
yield the standard theory of the finite case at a deeper level, for 
(8) is only a recursion in the mean probabilities of responses and in 
itself does not justify derivation of any sequential statistics like the 
probability of two successive $A_1$ responses. However, these matters
will not be pursued further here. 

In connection with this comparison of models it may also be
remarked that the response density recursion (9) of Theorem 4 is exactly
the same as that obtained in [3] for the continuous response linear model.
Consequently, the results in [3] for various kinds of contingent
reinforcement (and a fortiori noncontingent reinforcement) follow at once in the
present theory.


4. Noncontingent Reinforcement.

For noncontingent reinforcement schedules, that is, those for which
the distribution F(y) is independent of n and the past, we first use
the response density recursion (9) to prove some simple useful results which
do not explicitly involve the smearing distribution of the single stimulus
element and which also hold in the linear model but were not stated in [3].
There is, however, one necessary preliminary concerning derivation of the
asymptotic response distribution in the stimulus sampling theory.


Theorem 6. In the noncontingent case

$$r(x) = \lim_{n\to\infty} r_n(x) = \int_a^b k(x;y)\,f(y)\,dy. \tag{11}$$

Proof: Because in the noncontingent case $f_n(y) = f(y)$, we have at
once from Theorem 3 that $g_n(z) - f(z) = (1-\theta)^{n-1}[g_1(z) - f(z)]$, so that
for $0 < \theta \le 1$

$$g(z) = \lim_{n\to\infty} g_n(z) = f(z). \tag{12}$$

The theorem immediately follows from (12) and Theorem 1. Q.E.D.
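The limit (12) rests on the geometric decay $g_n - f = (1-\theta)^{n-1}(g_1 - f)$. A minimal sketch, with the hypothetical choices $\theta = 0.1$, a uniform $g_1$ and $f(y) = 2y$ on [0, 1] evaluated on a grid, shows the supremum distance shrinking by exactly the factor $1-\theta$ on every trial.

```python
THETA = 0.1
M = 100
GRID = [(i + 0.5) / M for i in range(M)]
f = [2.0 * y for y in GRID]   # hypothetical reinforcement density f(y) = 2y on [0, 1]
g = [1.0] * M                 # g_1: uniform density of the mode

gaps = []                     # sup over the grid of |g_n(z) - f(z)|, n = 1, 2, ...
for n in range(1, 61):
    gaps.append(max(abs(gz - fz) for gz, fz in zip(g, f)))
    g = [(1 - THETA) * gz + THETA * fz for gz, fz in zip(g, f)]   # recursion (7)

ratios = [later / earlier for earlier, later in zip(gaps, gaps[1:])]
print(f"sup|g_1 - f| = {gaps[0]:.3f}, sup|g_60 - f| = {gaps[-1]:.2e}")
print(f"successive ratios lie in [{min(ratios):.6f}, {max(ratios):.6f}]")
```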

We now use (11) to establish the following recursions. In the
statement of the theorem $E(X_n)$ is the expectation of the response
random variable $X_n$; $\mu_m(X_n)$ is its m-th raw moment; $\sigma^2(X_n)$ is its
variance; and X is the random variable with density r.


Theorem 7.

$$r_{n+1}(x) = (1-\theta)\,r_n(x) + \theta\,r(x), \tag{13}$$

$$E(X_{n+1}) = (1-\theta)\,E(X_n) + \theta\,E(X), \tag{14}$$

$$\mu_m(X_{n+1}) = (1-\theta)\,\mu_m(X_n) + \theta\,\mu_m(X), \tag{15}$$

$$\sigma^2(X_{n+1}) = (1-\theta)\,\sigma^2(X_n) + \theta\,\sigma^2(X) + \theta(1-\theta)\,[E(X_n) - E(X)]^2. \tag{16}$$

Proof: Because $f_n(y) = f(y)$ in the noncontingent case, equation
(13) follows at once from (9) and (11), i.e., from Theorems 4 and 6.
Multiplying both sides of (13) by $x^m$ and integrating over the interval
[a,b], we obtain (15), of which (14) is a special case. As for (16), we
infer it from the following:

$$\begin{aligned} \sigma^2(X_{n+1}) &= \mu_2(X_{n+1}) - E(X_{n+1})^2 \\ &= (1-\theta)\mu_2(X_n) + \theta\mu_2(X) - (1-\theta)^2E(X_n)^2 - 2\theta(1-\theta)E(X_n)E(X) - \theta^2E(X)^2 \\ &= (1-\theta)[\mu_2(X_n) - E(X_n)^2] + \theta[\mu_2(X) - E(X)^2] + (\theta-\theta^2)E(X_n)^2 - 2(\theta-\theta^2)E(X_n)E(X) + (\theta-\theta^2)E(X)^2 \\ &= (1-\theta)\sigma^2(X_n) + \theta\sigma^2(X) + \theta(1-\theta)[E(X_n) - E(X)]^2. \end{aligned}$$

Q.E.D.
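Equation (16) can be checked directly on any pair of distributions, since it uses only the mixture form (13). The sketch below assumes two hypothetical discrete response distributions on 20 points of [0, 1], standing in for $r_n$ and r, forms $r_{n+1}$ by (13), and compares its variance computed from scratch with the value given by (16).

```python
import math

THETA = 0.25
XS = [(i + 0.5) / 20 for i in range(20)]   # response points on [0, 1]


def normalize(w):
    total = sum(w)
    return [v / total for v in w]


def mean_var(p):
    """Mean and variance of a discrete distribution on XS."""
    mean = sum(x * w for x, w in zip(XS, p))
    return mean, sum((x - mean) ** 2 * w for x, w in zip(XS, p))


r_n = normalize([1.0 + x for x in XS])                             # hypothetical r_n
r_inf = normalize([math.exp(-8.0 * (x - 0.7) ** 2) for x in XS])   # hypothetical asymptotic r
r_next = [(1 - THETA) * u + THETA * v for u, v in zip(r_n, r_inf)]  # recursion (13)

E_n, V_n = mean_var(r_n)
E_inf, V_inf = mean_var(r_inf)
E_next, V_next = mean_var(r_next)
V_16 = (1 - THETA) * V_n + THETA * V_inf + THETA * (1 - THETA) * (E_n - E_inf) ** 2   # (16)
print(f"variance of r_(n+1): direct {V_next:.6f}, from (16) {V_16:.6f}")
```

Note the cross term $\theta(1-\theta)[E(X_n) - E(X)]^2$: the variance of the mixture exceeds the weighted average of the two variances whenever the means differ.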


Because (13)-(15) are first-order difference equations with constant
coefficients we have as an immediate consequence of the theorem:

Corollary.

$$r_n(x) = r(x) - [r(x) - r_1(x)](1-\theta)^{n-1}, \tag{17}$$

$$E(X_n) = E(X) - [E(X) - E(X_1)](1-\theta)^{n-1}, \tag{18}$$

$$\mu_m(X_n) = \mu_m(X) - [\mu_m(X) - \mu_m(X_1)](1-\theta)^{n-1}. \tag{19}$$
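A practical use of (18) is to ask how many trials pass before the mean response is within a tolerance $\varepsilon$ of its asymptote. Solving (18) for n gives the smallest n with $(1-\theta)^{n-1}|E(X) - E(X_1)| \le \varepsilon$. The sketch below, with hypothetical values $\theta = 0.3$, $E(X_1) = 0.40$ and $E(X) = 0.62$, also confirms that iterating (14) reproduces (18); the helper names are ours.

```python
import math

THETA = 0.3
E_1, E_INF = 0.40, 0.62     # illustrative E(X_1) and asymptotic E(X)


def closed_form_mean(n):
    """Equation (18): E(X_n) = E(X) - [E(X) - E(X_1)] (1 - theta)^(n - 1)."""
    return E_INF - (E_INF - E_1) * (1 - THETA) ** (n - 1)


def first_trial_within(eps):
    """Smallest n with |E(X_n) - E(X)| <= eps, solved from (18)."""
    gap = abs(E_INF - E_1)
    if gap <= eps:
        return 1
    return 1 + math.ceil(math.log(eps / gap) / math.log(1 - THETA))


E = E_1
for n in range(1, 16):
    assert abs(E - closed_form_mean(n)) < 1e-12   # iterating (14) agrees with (18)
    E = (1 - THETA) * E + THETA * E_INF

print(first_trial_within(0.01))
```

With these values it prints 10: the gap is about 0.0089 on trial 10 but still about 0.0127 on trial 9.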


Although the linear and (one-element) stimulus sampling models
both yield (13)-(19), predictions in the two models are already different
for one of the simplest sequential statistics, namely, the probability of
two successive responses in the same or different subintervals.

For two subintervals [a,c] and [c,b], we have the following
theorem for the stimulus sampling model. The result generalizes directly
to any finite number of subintervals.


Theorem 8. For noncontingent reinforcement

$$\lim_{n\to\infty} P(a \le X_{n+1} \le c,\ a \le X_n \le c) = \theta R(c)^2 + (1-\theta)\int_a^c\int_a^c\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx\,dx', \tag{20}$$

$$\lim_{n\to\infty} P(a \le X_{n+1} \le c,\ c \le X_n \le b) = \theta R(c)(1-R(c)) + (1-\theta)\int_a^c\int_c^b\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx'\,dx, \tag{21}$$

where $R(c) = \lim_{n\to\infty} R_n(c)$.
Proof: We first establish (20). To begin with,

$$P(a \le X_{n+1} \le c,\ a \le X_n \le c) = \int_a^c\int_a^c j_{n+1}(x_{n+1}, x_n)\,dx_{n+1}\,dx_n.$$

Applying the axioms in the usual way to the right-hand side, that is,
expanding $j_{n+1}(x_{n+1}, x_n)$ over $z_{n+1}$, $\delta_n$, $y_n$ and $z_n$ and using
Axioms C2 and C5, we obtain:

$$\int_a^c\int_a^c j_{n+1}(x_{n+1}, x_n)\,dx_{n+1}\,dx_n = \int_a^c\int_a^c\int_a^b\int_a^b \big[\theta\,k(x_{n+1};y_n) + (1-\theta)\,k(x_{n+1};z_n)\big]\,f(y_n)\,k(x_n;z_n)\,g_n(z_n)\,dy_n\,dz_n\,dx_{n+1}\,dx_n.$$
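Now $\lim_{n\to\infty} g_n(z) = f(z)$, whence at asymptote, we have by rearranging the
right-hand side and re-lettering variables:

$$\lim_{n\to\infty} P(a \le X_{n+1} \le c,\ a \le X_n \le c) = \theta\Big[\int_a^c\int_a^b k(x;y)\,f(y)\,dy\,dx\Big]\Big[\int_a^c\int_a^b k(x';z)\,f(z)\,dz\,dx'\Big] + (1-\theta)\int_a^c\int_a^c\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx\,dx',$$

but by (11) each bracket equals R(c), so the first term on the right is just
$\theta R(c)^2$, which when substituted yields (20).

The argument concerning (21) proceeds along exactly the same lines
with functions of $x_n$ now integrated over the interval [c,b]. Q.E.D.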
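The two sides of (20) and (21) can be compared numerically. The sketch below assumes a discretized version of the model (60 cells on [0, 1], c = 0.4, a renormalized normal kernel and a hypothetical linear reinforcement density) and computes the asymptotic joint probabilities by enumerating the model directly: the mode has distribution f, and on the next trial it is replaced by a fresh reinforcement point with probability $\theta$ or kept with probability $1-\theta$. It also checks that (20) and (21) add up to R(c), the asymptotic probability that the later response falls in [a, c].

```python
import math

M = 60
GRID = [(i + 0.5) / M for i in range(M)]    # cells on [a, b] = [0, 1]
THETA, SD = 0.3, 0.1                        # illustrative theta and kernel spread
C_INDEX = 24                                # c = 0.4 is the boundary between cells 23 and 24
LOWER, UPPER = range(C_INDEX), range(C_INDEX, M)   # cells in [a, c] and in [c, b]


def normalize(w):
    total = sum(w)
    return [v / total for v in w]


# K[z][i]: probability of response cell i when the mode sits in cell z.
K = [normalize([math.exp(-0.5 * ((x - z) / SD) ** 2) for x in GRID]) for z in GRID]
f = normalize([1.0 + 2.0 * y for y in GRID])   # hypothetical reinforcement density


def k_mass(z, cells):
    """Probability that a response from mode cell z falls in the given cells."""
    return sum(K[z][i] for i in cells)


def joint_asymptotic(first, second):
    """P(X_n in first, X_(n+1) in second) at asymptote, by direct enumeration:
    the mode z_n has distribution f; with probability theta it is replaced by
    a fresh reinforcement y_n ~ f, otherwise it is kept (Axiom C2)."""
    fresh = sum(f[y] * k_mass(y, second) for y in range(M))
    return sum(f[z] * k_mass(z, first) * (THETA * fresh + (1 - THETA) * k_mass(z, second))
               for z in range(M))


R_c = sum(f[z] * k_mass(z, LOWER) for z in range(M))          # R(c), via (11)
alpha = sum(f[z] * k_mass(z, LOWER) ** 2 for z in range(M))
beta = sum(f[z] * k_mass(z, LOWER) * k_mass(z, UPPER) for z in range(M))
eq20 = THETA * R_c ** 2 + (1 - THETA) * alpha
eq21 = THETA * R_c * (1 - R_c) + (1 - THETA) * beta
print(f"(20): {eq20:.6f} vs direct {joint_asymptotic(LOWER, LOWER):.6f}")
print(f"(21): {eq21:.6f} vs direct {joint_asymptotic(UPPER, LOWER):.6f}")
```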

For comparative purposes the corresponding results for the linear 
model are derived in the Appendix. 

The theorem just proved may be used to develop a reasonably good
method of estimating the learning parameter $\theta$. The sequence of response
random variables $\langle A_1, A_2, \ldots, A_n, \ldots \rangle$, where

$A_n = 1$ if the response on trial n is in the interval [a,c],
$A_n = 2$ otherwise,

is a chain of infinite order. If it were a first-order Markov chain, (20)
and (21) could be used to obtain a maximum likelihood estimate of $\theta$. The
estimate $\theta^*$ proposed here is formally identical with the latter, but of
course it is not the maximum likelihood estimate. For purposes of a label I
call it the pseudo-maximum likelihood estimate.
Let $a_1, a_2, \ldots, a_n$ represent a finite sequence of values of the
response random variables $A_1, A_2, \ldots, A_n$ from trial 1 to trial n.
Let s be the number of subjects. Then, granted statistical independence
of the subjects, the maximum likelihood estimate of $\theta$ is the number $\hat{\theta}$
(if it exists) such that for all $\theta'$

$$\prod_{\sigma=1}^{s} P(a_1, a_2, \ldots, a_n; \hat{\theta}) \ge \prod_{\sigma=1}^{s} P(a_1, a_2, \ldots, a_n; \theta'), \tag{22}$$

where $P(a_1, a_2, \ldots, a_n; \theta)$ is the probability of the sequence of
responses $a_1, a_2, \ldots, a_n$ for subject $\sigma$ when the learning parameter is $\theta$.
As should be clear from preceding remarks, the pseudo-maximum
likelihood estimate of $\theta$ is the number $\theta^*$ such that for all $\theta'$

$$\prod_{\sigma=1}^{s}\Big[P(a_1;\theta^*)\prod_{m=2}^{n} P(a_m \mid a_{m-1};\theta^*)\Big] \ge \prod_{\sigma=1}^{s}\Big[P(a_1;\theta')\prod_{m=2}^{n} P(a_m \mid a_{m-1};\theta')\Big]. \tag{23}$$
To simplify notation, let $p_{ij}(\theta)$ be the probability of going from state i
to state j, for i, j = 1, 2, with parameter $\theta$; let $n_{ij}$ be the number of
actual transitions from state i to state j, summed over trials and subjects
(the $n_{ij}$ are tabulated from experimental data); let $p_i(\theta)$ be the
probability of being in state i on trial 1; and let $n_i$ be the number of
subjects in state i on trial 1. We then want to find the $\theta$ which
maximizes

$$\prod_i p_i(\theta)^{n_i}\prod_{i,j} p_{ij}(\theta)^{n_{ij}}.$$

It is usually easier to work with the log of this expression, so we seek
to maximize

$$L^*(\theta) = \sum_i\Big[n_i\log p_i(\theta) + \sum_j n_{ij}\log p_{ij}(\theta)\Big]. \tag{24}$$

In most cases $L^*(\theta)$ has a local maximum, so we can find $\theta^*$ as an
appropriate solution of

$$\frac{dL^*(\theta)}{d\theta} = \sum_i\frac{n_i\,p_i'(\theta)}{p_i(\theta)} + \sum_{i,j}\frac{n_{ij}\,p_{ij}'(\theta)}{p_{ij}(\theta)} = 0, \tag{25}$$

where p' is the derivative of p with respect to $\theta$. Now on the
basis of (20) and (21), at asymptote


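$$p_{11}(\theta) = \theta R(c) + \frac{1-\theta}{R(c)}\int_a^c\int_a^c\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx\,dx' \tag{26}$$

and

$$p_{21}(\theta) = \theta R(c) + \frac{1-\theta}{1-R(c)}\int_a^c\int_c^b\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx'\,dx, \tag{27}$$

and $p_i(\theta)$ is independent of $\theta$. Also, of course, $p_{12}(\theta) = 1 - p_{11}(\theta)$
and $p_{22}(\theta) = 1 - p_{21}(\theta)$. Moreover,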
$$p_{11}'(\theta) = R(c) - \frac{\alpha}{R(c)} \tag{28}$$

and

$$p_{21}'(\theta) = R(c) - \frac{\beta}{1-R(c)}, \tag{29}$$

where

$$\alpha = \int_a^b K(c;z)^2\,f(z)\,dz, \tag{30}$$

since

$$\int_a^b K(c;z)^2\,f(z)\,dz = \int_a^c\int_a^c\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx\,dx',$$

and

$$\beta = \int_a^c\int_c^b\int_a^b k(x;z)\,k(x';z)\,f(z)\,dz\,dx'\,dx = \int_a^b K(c;z)[1 - K(c;z)]\,f(z)\,dz = R(c) - \alpha. \tag{31}$$
a 


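Applying (26)-(31) to (25) and using the fact that $p_i(\theta)$ is independent
of $\theta$, we obtain:

$$\frac{dL^*(\theta)}{d\theta} = \frac{n_{11}[R(c) - \alpha/R(c)]}{\theta R(c) + (1-\theta)\alpha/R(c)} - \frac{n_{12}[R(c) - \alpha/R(c)]}{1 - \theta R(c) - (1-\theta)\alpha/R(c)} + \frac{n_{21}[R(c) - \beta/(1-R(c))]}{\theta R(c) + (1-\theta)\beta/(1-R(c))} - \frac{n_{22}[R(c) - \beta/(1-R(c))]}{1 - \theta R(c) - (1-\theta)\beta/(1-R(c))} = 0. \tag{32}$$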


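Solving (32), we have

Theorem 9. If $r_1(x) = r(x)$ for all x in [a,b], then the
estimate $\theta^*$ is a solution of the quadratic equation

$$N\theta^2 + \big[(N - n_{11})A + (n_{11} + n_{22})B + (N - n_{22})C\big]\theta + n_{22}AB + (n_{12} + n_{21})AC + n_{11}BC = 0,$$

where $N = n_{11} + n_{12} + n_{21} + n_{22}$ and

$$A = \frac{\alpha}{R(c)^2 - \alpha}, \qquad B = -\frac{R(c) - \alpha}{R(c)^2 - \alpha}, \qquad C = \frac{1 + \alpha - 2R(c)}{R(c)^2 - \alpha}.$$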


Moreover, if, R(c) 5 » then 


it 
Note that the hypothesis of the theorem simply requires that we start
counting trials at asymptote. The statistical properties of the estimator
$\theta^*$ need investigation; it can be shown that it is consistent.
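The quadratic of Theorem 9 can be solved mechanically. One can check from (26), (27) and (31) that $p_{11}(\theta)$ is proportional to $\theta + A$, $p_{12}(\theta)$ and $p_{21}(\theta)$ to $\theta + B$, and $p_{22}(\theta)$ to $\theta + C$, which is where the quadratic comes from. The sketch below (function names ours) solves the quadratic, keeps a root in [0, 1], and confirms it against a grid search of the transition part of (24). The counts are hypothetical: they are the exact expected transitions, out of 1000, at $\theta = 0.3$ with R(c) = 0.4 and $\alpha = 0.25$, so the estimate should recover 0.3.

```python
import math


def log_lik(theta, n11, n12, n21, n22, R, alpha):
    """Transition part of L*(theta) in (24), using (26)-(27) with beta = R - alpha."""
    p11 = theta * R + (1 - theta) * alpha / R
    p21 = theta * R + (1 - theta) * (R - alpha) / (1 - R)
    return (n11 * math.log(p11) + n12 * math.log(1 - p11)
            + n21 * math.log(p21) + n22 * math.log(1 - p21))


def pseudo_mle(n11, n12, n21, n22, R, alpha):
    """Pseudo-maximum likelihood estimate theta* from the quadratic of Theorem 9.

    n_ij are transition counts between the states 1 = [a, c] and 2 = [c, b],
    R = R(c) and alpha as in (30); trials are assumed to start at asymptote."""
    D = R ** 2 - alpha
    A = alpha / D
    B = -(R - alpha) / D
    C = (1 + alpha - 2 * R) / D
    N = n11 + n12 + n21 + n22
    lin = (N - n11) * A + (n11 + n22) * B + (N - n22) * C
    const = n22 * A * B + (n12 + n21) * A * C + n11 * B * C
    disc = lin ** 2 - 4 * N * const
    if disc < 0:
        return None
    roots = [(-lin + sign * math.sqrt(disc)) / (2 * N) for sign in (-1, 1)]
    admissible = [t for t in roots if 0.0 <= t <= 1.0]
    # If both roots are admissible, keep the one with the larger L*(theta).
    return max(admissible, key=lambda t: log_lik(t, n11, n12, n21, n22, R, alpha),
               default=None)


# Hypothetical counts: exact expected transitions at theta = 0.3 when
# R(c) = 0.4 and alpha = 0.25, so the estimate should come back as 0.3.
counts = dict(n11=223, n12=177, n21=177, n22=423)
theta_star = pseudo_mle(**counts, R=0.4, alpha=0.25)
print(f"theta* = {theta_star:.6f}")
```

With these counts the quadratic reduces to $30\theta^2 - 119\theta + 33 = 0$, whose roots are 0.3 and about 3.67; only the first is admissible.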

I conclude the treatment of noncontingent reinforcement with two
expressions dealing with important sequential properties of stimulus
sampling models. The first gives the probability of a response in the
interval $[a_1, a_2]$ given that on the previous trial the reinforcing event
occurred in the interval $[b_1, b_2]$.


Theorem 10.

$$P(a_1 \le X_{n+1} \le a_2 \mid b_1 \le Y_n \le b_2) = (1-\theta)[R_n(a_2) - R_n(a_1)] + \frac{\theta}{F(b_2) - F(b_1)}\int_{b_1}^{b_2}\int_{a_1}^{a_2} k(x;y)\,f(y)\,dx\,dy. \tag{33}$$
Proof: By the usual expansion

$$P(a_1 \le X_{n+1} \le a_2 \mid b_1 \le Y_n \le b_2) = \frac{1}{F(b_2) - F(b_1)}\int_{a_1}^{a_2}\int_{b_1}^{b_2}\int_a^b j_{n+1}(x_{n+1}, y_n, z_n)\,dz_n\,dy_n\,dx_{n+1}.$$
And the right-hand side is

$$\frac{1}{F(b_2) - F(b_1)}\Big[(1-\theta)\int_{a_1}^{a_2}\int_{b_1}^{b_2}\int_a^b k(x;z)\,g_n(z)\,f(y)\,dz\,dy\,dx + \theta\int_{a_1}^{a_2}\int_{b_1}^{b_2}\int_a^b k(x;y)\,f(y)\,g_n(z)\,dz\,dy\,dx\Big].$$

Now in the first term we can integrate

$$\int_{b_1}^{b_2} f(y)\,dy = F(b_2) - F(b_1),$$

and in the second term $\int_a^b g_n(z)\,dz = 1$. Using these two results, we obtain
the theorem at once. Q.E.D.
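Theorem 10 can likewise be checked by brute force. In the sketch below, which assumes a discretized model with 40 cells on [0, 1], a renormalized normal kernel, and hypothetical distributions f and $g_n$ (deliberately not at asymptote, since (33) holds on every trial), the joint probability of $b_1 \le Y_n \le b_2$ and $a_1 \le X_{n+1} \le a_2$ is enumerated over $z_n$, $y_n$ and the effectiveness of conditioning, divided by $F(b_2) - F(b_1)$, and compared with (33). The intervals chosen are illustrative.

```python
import math

M = 40
GRID = [(i + 0.5) / M for i in range(M)]
THETA, SD = 0.25, 0.1           # illustrative values


def normalize(w):
    total = sum(w)
    return [v / total for v in w]


K = [normalize([math.exp(-0.5 * ((x - z) / SD) ** 2) for x in GRID]) for z in GRID]
f = normalize([1.0 + math.sin(3.0 * y) for y in GRID])   # hypothetical f(y)
g_n = normalize([1.0 + 4.0 * z * z for z in GRID])       # hypothetical g_n(z), not at asymptote

X_CELLS = range(10, 20)         # [a_1, a_2] = [0.25, 0.5]
Y_CELLS = range(24, 36)         # [b_1, b_2] = [0.6, 0.9]


def k_mass(z, cells):
    """Probability that a response from mode cell z falls in the given cells."""
    return sum(K[z][i] for i in cells)


F_b = sum(f[y] for y in Y_CELLS)   # F(b_2) - F(b_1)

# Direct enumeration over z_n, y_n and the effectiveness of conditioning.
joint = sum(g_n[z] * f[y] * (THETA * k_mass(y, X_CELLS) + (1 - THETA) * k_mass(z, X_CELLS))
            for z in range(M) for y in Y_CELLS)
direct = joint / F_b

# Equation (33).
R_diff = sum(g_n[z] * k_mass(z, X_CELLS) for z in range(M))   # R_n(a_2) - R_n(a_1)
eq33 = (1 - THETA) * R_diff + THETA / F_b * sum(f[y] * k_mass(y, X_CELLS) for y in Y_CELLS)
print(f"conditional probability: direct {direct:.6f}, (33) {eq33:.6f}")
```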
The second expression to which we now turn gives the probability of a
response in the interval $[a_1, a_2]$ given that on the previous trial the
reinforcing event occurred in the interval $[b_1, b_2]$ and the response in the
interval $[a_3, a_4]$.


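Theorem 11.

$$P(a_1 \le X_{n+1} \le a_2 \mid b_1 \le Y_n \le b_2,\ a_3 \le X_n \le a_4) = \frac{1-\theta}{R_n(a_4) - R_n(a_3)}\int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_a^b k(x;z)\,k(x';z)\,g_n(z)\,dz\,dx'\,dx + \frac{\theta}{F(b_2) - F(b_1)}\int_{a_1}^{a_2}\int_{b_1}^{b_2} k(x;y)\,f(y)\,dy\,dx. \tag{34}$$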
Proof: It is first useful to observe that for noncontingent
reinforcement

$$P(b_1 \le Y_n \le b_2,\ a_3 \le X_n \le a_4) = P(b_1 \le Y_n \le b_2)\,P(a_3 \le X_n \le a_4) = [F(b_2) - F(b_1)][R_n(a_4) - R_n(a_3)].$$


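Applying the usual expansion to the left-hand quantity in (34), we have

$$P(a_1 \le X_{n+1} \le a_2 \mid b_1 \le Y_n \le b_2,\ a_3 \le X_n \le a_4) = \frac{1}{[F(b_2) - F(b_1)][R_n(a_4) - R_n(a_3)]}\int_{a_1}^{a_2}\int_{b_1}^{b_2}\int_{a_3}^{a_4}\int_a^b j_{n+1}(x_{n+1}, y_n, x_n, z_n)\,dz_n\,dx_n\,dy_n\,dx_{n+1},$$

which, using particularly Axioms C2 and C5, is equal to

$$\frac{1}{[F(b_2) - F(b_1)][R_n(a_4) - R_n(a_3)]}\Big[(1-\theta)\int_{a_1}^{a_2}\int_{b_1}^{b_2}\int_{a_3}^{a_4}\int_a^b k(x;z)\,k(x';z)\,g_n(z)\,f(y)\,dz\,dx'\,dy\,dx + \theta\int_{a_1}^{a_2}\int_{b_1}^{b_2}\int_{a_3}^{a_4}\int_a^b k(x;y)\,f(y)\,k(x';z)\,g_n(z)\,dz\,dx'\,dy\,dx\Big].$$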
Now in the first term of this last expression we may integrate out the
function f(y) to obtain $F(b_2) - F(b_1)$, which cancels the corresponding
quantity in the denominator. Similarly in the second term we may integrate
out $k(x';z)\,g_n(z)$ to obtain $R_n(a_4) - R_n(a_3)$, which for this term cancels
the corresponding quantity in the denominator. Putting these results
together, we have exactly the theorem. Q.E.D.

It may be noticed that by applying the Corollary of Theorem 7, more
explicit results are easily obtained from both Theorems 10 and 11.


5. Simple Discrimination. 

It may be of some interest to sketch how the present theory may be
applied to simple discrimination situations where on each trial exactly
one stimulus $s_i$ is presented, and associated with each $s_i$ is a
reinforcement distribution $F^i$. (Readers who do not like the idea of exactly one
stimulus being presented may think of each $s_i$ as being a particular pattern
of stimuli.) Let the probability of presentation of $s_i$ on any trial be
$w_i$, with $\sum_{i=1}^{N} w_i = 1$, $w_i \neq 0$ for i = 1, ..., N, and $w_i$ independent of
trial number and any behavior on preceding trials.
The tree of the Markov process in the states $(z^1, z^2)$ for N = 2
and $w_1 = w_2 = 1/2$ is given in Figure 1.
[Figure 1. Tree of the Markov process in the states $(z^1, z^2)$ for N = 2 and $w_1 = w_2 = 1/2$, with branch probabilities $\theta$ and $1-\theta$.]


Corresponding to Theorem 1, we have by the same sort of proof
for arbitrary N

$$r_n(x) = \sum_{i=1}^{N} w_i\int_a^b k_i(x;z^i)\,g^i_n(z^i)\,dz^i. \tag{35}$$


Corresponding to Theorem 3 we have

$$ g_{i,n+1}(z^i \mid S_n = s_i) = (1-\theta)\, g_{i,n}(z^i) + \theta f_i(z^i) , \tag{36} $$

and, by virtue of Axiom C4, for $i \neq j$ and $S_n = s_j$,

$$ g_{i,n+1}(z^i \mid S_n = s_j) = g_{i,n}(z^i) , \tag{37} $$

whence it easily follows that

$$ \lim_{n\to\infty} g_{i,n}(z^i) = f_i(z^i) . \tag{38} $$
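The step from (36) and (37) to (38) can be made explicit. Because $s_i$ is presented with probability $w_i$ independently of the state, averaging the two cases gives $g_{i,n+1} = (1 - w_i\theta)\, g_{i,n} + w_i\theta f_i$, so the distance between $g_{i,n}$ and $f_i$ shrinks by the factor $1 - w_i\theta$ on every trial; this is where $w_i \neq 0$ is used. The sketch below iterates that averaged recursion on a hypothetical four-point grid; the grid, the starting distribution, $f_i$, $w_i$ and $\theta$ are all assumptions.

```python
def step_marginal(g, f, w_i, theta):
    """One trial of g_{i,n+1} = (1 - w_i*theta) g_{i,n} + w_i*theta f_i on a grid."""
    return [(1 - w_i * theta) * g_z + w_i * theta * f_z for g_z, f_z in zip(g, f)]


# Hypothetical grid values: s_i starts fully conditioned to the first point.
g = [1.0, 0.0, 0.0, 0.0]
f = [0.1, 0.4, 0.4, 0.1]
W_I, THETA = 0.5, 0.2
for _ in range(200):
    g = step_marginal(g, f, W_I, THETA)

# Each trial multiplies the distance to f_i by 1 - w_i*theta.
gap = max(abs(g_z - f_z) for g_z, f_z in zip(g, f))
print(gap, (1 - W_I * THETA) ** 200 * 0.9)
```

The two printed numbers should coincide up to rounding, since the initial gap is $0.9$ and each trial multiplies it by $1 - w_i\theta = 0.9$.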


We then have also that

$$ \lim_{n\to\infty} P(a_1 < X_n \le a_2 \mid S_n = s_i) = \int_{a_1}^{a_2}\int_a^b k_i(x; y)\, f_i(y)\, dx\, dy . \tag{39} $$
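To see what (39) asks for in practice, the double integral can be approximated by a midpoint rule once $k_i$ and $f_i$ are specified. The sketch below assumes, purely for illustration, the response interval $[a, b] = [0, 1]$, the smearing density $k_i(x; y) = 1 + (2x-1)(2y-1)$ (nonnegative, and integrating to 1 in $x$ for each $y$) and the reinforcement density $f_i(y) = 2y$; the function name `limit_prob_39` is hypothetical.

```python
def limit_prob_39(k, f, a1, a2, a=0.0, b=1.0, steps=400):
    """Midpoint-rule approximation of the double integral on the right of (39)."""
    hx = (a2 - a1) / steps
    hy = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * hx
        inner = 0.0
        for j in range(steps):
            y = a + (j + 0.5) * hy
            inner += k(x, y) * f(y)       # integrand k_i(x; y) f_i(y)
        total += inner * hy * hx
    return total


# Hypothetical choices on [a, b] = [0, 1].
def k_i(x, y):
    return 1.0 + (2 * x - 1) * (2 * y - 1)


def f_i(y):
    return 2 * y


print(limit_prob_39(k_i, f_i, 0.0, 0.5))
```

For these choices the inner integral is $1 + (2x-1)/3$, so the exact value over $(0, 0.5]$ is $5/12$, which the midpoint rule reproduces closely.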


The results (35)-(39) and some other related ones which are easily obtained, although simple in character, permit application of the theory developed in this paper to simple discrimination experiments with a continuum of responses. On the other hand, it is obvious that the present theory must be modified and extended in fundamental ways to deal with discrimination experiments which have a continuum of stimuli as well as responses.


REFERENCES

[1] Atkinson, R. and P. Suppes. "An analysis of two-person game situations in terms of statistical learning theory," J. Exp. Psychol., 55 (1958), 369-378.

[2] Estes, W.K. "Toward a statistical theory of learning," Psychol. Rev., 57 (1950), 94-107.

[3] Suppes, P. "A linear learning model for a continuum of responses," Chapter 19 in Studies in Mathematical Learning Theory, edited by R.R. Bush and W.K. Estes. Stanford, California: Stanford University Press, 1959.

[4] Suppes, P. and R. Atkinson. "... Situations, I. The Theory." Technical Report No. 21, Contract Nonr 225(17), Applied Mathematics and Statistics Laboratory, Stanford University, 1959.
APPENDIX 2/


Our purpose is to derive for the linear model of [3] the analogues of (20) and (21). A brief description of the linear model will make the present discussion nearly self-contained. An experiment may be represented by a sequence $(X_1, Y_1, X_2, Y_2, \ldots)$ of response and reinforcement random variables. The theory is formulated for the probability of a response on trial $n+1$ given the entire preceding sequence of responses and reinforcements. For this sequence we use the notation $s_n$ (not to be confused with the notation for the value of the sampling random variable in the main body of the paper). Aside from continuity and piecewise differentiability assumptions, the single axiom of the linear model is:

$$ J_{n+1}(x \mid y_n, x_n, s_{n-1}) = (1-\theta)\, J_n(x \mid s_{n-1}) + \theta K(x; y_n) , \tag{40} $$

where $J_n$ is the joint distribution and $K$ is the smearing distribution.
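Axiom (40) says that each reinforcement moves the response distribution a fraction $\theta$ of the way toward the smearing distribution centered on the reinforced value. A minimal sketch, assuming a finite response grid so that $J_n$ and $K(\cdot\,; y)$ become probability vectors; the function `linear_update`, the matrix `SMEAR` and the value of $\theta$ are hypothetical.

```python
def linear_update(p, y, theta, smear):
    """Axiom (40) on a finite response grid.

    p[x] is the probability of response x on trial n given the history, and
    smear[y][x] stands in for the smearing distribution K(. ; y); the result is
    the response distribution on trial n + 1 after reinforcement y.
    """
    return [(1 - theta) * p_x + theta * k_x for p_x, k_x in zip(p, smear[y])]


SMEAR = [[0.8, 0.2, 0.0],      # hypothetical smearing rows, one per reinforcement y
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8]]
p = [1 / 3, 1 / 3, 1 / 3]
for _ in range(100):            # the same reinforcement y = 2 on every trial
    p = linear_update(p, 2, 0.2, SMEAR)
print([round(v, 6) for v in p])
```

Repeating the same reinforcement drives the response distribution geometrically toward the row `SMEAR[2]`; with random (noncontingent) reinforcement the same update produces the stochastic process studied below.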


We first need to define the cross-moments

$$ W(a_1, a_2, a_3, a_4, n) = \int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_{s_{n-1}} j_n(x \mid s_{n-1})\, j_n(x' \mid s_{n-1})\, j(s_{n-1})\, dx\, dx'\, ds_{n-1} , \tag{41} $$

where the subscript $s_{n-1}$ on the third integration sign indicates integration over the $2(n-1)$-fold Cartesian product of the interval $[a,b]$ for the sequence $s_{n-1}$. The cross-moments defined by (41) generalize the moments of [3].

2/ I am indebted to ... W. Frankmann for useful comments on the subject of this Appendix.
Assuming henceforth noncontingent reinforcement, it follows by simple extension of some results in [3] that

$$ \lim_{n\to\infty} P(a_1 < X_{n+1} \le a_2,\ a_3 < X_n \le a_4) = (1-\theta) \lim_{n\to\infty} W(a_1,a_2,a_3,a_4,n) + \theta\,[R(a_2)-R(a_1)]\,[R(a_4)-R(a_3)] . \tag{42} $$


To obtain an explicit answer we must compute the limit on the right, which 
we now proceed to do. 
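By virtue of the definition of $s_{n-1}$, the right-hand side of (42) may be rewritten and we have:

$$ W(a_1,a_2,a_3,a_4,n) = \int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_a^b\int_a^b\int_{s_{n-2}} j_n(x \mid y_{n-1}, x_{n-1}, s_{n-2})\, j_n(x' \mid y_{n-1}, x_{n-1}, s_{n-2})\, j(y_{n-1}, x_{n-1}, s_{n-2})\, dx\, dx'\, dy_{n-1}\, dx_{n-1}\, ds_{n-2} . \tag{43} $$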


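Applying the axiom (40) to the right-hand side of (43) and simplifying, we obtain:

$$ \begin{aligned} W(a_1,a_2,a_3,a_4,n) &= (1-\theta)^2 \int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_{s_{n-2}} j_{n-1}(x \mid s_{n-2})\, j_{n-1}(x' \mid s_{n-2})\, j(s_{n-2})\, dx\, dx'\, ds_{n-2} \\ &\quad + 2\theta(1-\theta) \int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_a^b\int_{s_{n-2}} j_{n-1}(x \mid s_{n-2})\, k(x'; y)\, f(y)\, j(s_{n-2})\, dx\, dx'\, dy\, ds_{n-2} \\ &\quad + \theta^2 \int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_a^b k(x; y_{n-1})\, k(x'; y_{n-1})\, f(y_{n-1})\, dx\, dx'\, dy_{n-1} . \end{aligned} \tag{44} $$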
Now the first term on the right of (44) is simply $(1-\theta)^2 W(a_1,a_2,a_3,a_4,n-1)$; the second term is $2\theta(1-\theta)[R_{n-1}(a_2)-R_{n-1}(a_1)][R(a_4)-R(a_3)]$, since under noncontingent reinforcement the asymptotic response distribution of [3] is $R(a) = \int_a^b K(a; y) f(y)\, dy$; and the integral of the third term is a direct generalization of the quantity defined by (31). Moreover, this integral is independent of $n$, and we may define for ease of notation:

$$ \gamma(a_1,a_2,a_3,a_4) = \int_{a_1}^{a_2}\int_{a_3}^{a_4}\int_a^b k(x; y)\, k(x'; y)\, f(y)\, dx\, dx'\, dy . \tag{45} $$

In these terms, (44) becomes:

$$ W(a_1,a_2,a_3,a_4,n) = (1-\theta)^2\, W(a_1,a_2,a_3,a_4,n-1) + 2\theta(1-\theta)\,[R_{n-1}(a_2)-R_{n-1}(a_1)]\,[R(a_4)-R(a_3)] + \theta^2\, \gamma(a_1,a_2,a_3,a_4) . \tag{46} $$
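It then easily follows from (46), since $R_{n-1}(a) \to R(a)$ and $(1-\theta)^2 < 1$, that

$$ \lim_{n\to\infty} W(a_1,a_2,a_3,a_4,n) = W(a_1,a_2,a_3,a_4) = \frac{2(1-\theta)\,[R(a_2)-R(a_1)]\,[R(a_4)-R(a_3)] + \theta\,\gamma(a_1,a_2,a_3,a_4)}{2-\theta} . \tag{47} $$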
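The passage from (46) to (47) is a fixed-point computation: once $R_{n-1}$ has reached $R$, the recursion has the form $W = (1-\theta)^2 W + c$ with $c = 2\theta(1-\theta)p_2 + \theta^2\gamma$, where $p_2 = [R(a_2)-R(a_1)][R(a_4)-R(a_3)]$. Hence $W = c/(1-(1-\theta)^2) = c/(\theta(2-\theta))$, and cancelling $\theta$ gives (47). The sketch below checks this numerically; holding $R_{n-1}$ at its limit, the names `iterate_w` and `limit_47`, and the values of $p_2$, $\gamma$ and $\theta$ are all assumptions.

```python
def iterate_w(w_start, p2, gamma, theta, steps):
    """Iterate recursion (46), holding R_{n-1} at its limit R (an assumption)."""
    w = w_start
    for _ in range(steps):
        w = (1 - theta) ** 2 * w + 2 * theta * (1 - theta) * p2 + theta ** 2 * gamma
    return w


def limit_47(p2, gamma, theta):
    """Closed form (47): the fixed point of the recursion above."""
    return (2 * (1 - theta) * p2 + theta * gamma) / (2 - theta)


# Hypothetical values: p2 = [R(a_2)-R(a_1)][R(a_4)-R(a_3)], gamma = gamma(a_1,...,a_4).
print(iterate_w(0.0, 0.12, 0.2, 0.3, 500), limit_47(0.12, 0.2, 0.3))
```

Because the contraction factor $(1-\theta)^2$ is below 1 for $0 < \theta \le 1$, the starting value `w_start` does not affect the limit.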


Combining (42) and (47), we then have the following theorem.
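Theorem. In the linear model

$$ \lim_{n\to\infty} P(a_1 < X_{n+1} \le a_2,\ a_3 < X_n \le a_4) = \theta\,[R(a_2)-R(a_1)]\,[R(a_4)-R(a_3)] + (1-\theta)\,\frac{2(1-\theta)\,[R(a_2)-R(a_1)]\,[R(a_4)-R(a_3)] + \theta\,\gamma(a_1,a_2,a_3,a_4)}{2-\theta} . \tag{48} $$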
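To obtain the direct analogue of (20), take $a_1 = a_3 = a$ and $a_2 = a_4 = c$; since $R(a) = 0$, (48) specializes to:

$$ \lim_{n\to\infty} P(a < X_{n+1} \le c,\ a < X_n \le c) = \theta R(c)^2 + (1-\theta)\,\frac{2(1-\theta)\,R(c)^2 + \theta\,\gamma(a,c,a,c)}{2-\theta} , $$

where $\gamma(a,c,a,c)$, obtained from (45) with these limits, is the quantity defined by (30). The analogue of (21) may be obtained in like fashion.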
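The specialized form of (48) can be checked against a long simulated run of the linear model. The sketch assumes a four-point response grid standing in for $[a, b]$, with the first two cells playing the role of $(a, c]$, and hypothetical choices of the reinforcement probabilities and smearing rows; it applies axiom (40) after each noncontingent reinforcement and records how often two successive responses both fall in $(a, c]$. Using one long run after a burn-in relies on the process settling to its asymptotic behavior, which is an assumption of the sketch rather than a result quoted from the text.

```python
import random

THETA = 0.3
CELLS = range(4)                      # hypothetical grid standing in for [a, b]
F_PROBS = [0.1, 0.3, 0.4, 0.2]        # hypothetical reinforcement distribution f
SMEAR = [[0.7, 0.2, 0.1, 0.0],        # hypothetical rows SMEAR[y] ~ K(. ; y)
         [0.2, 0.6, 0.2, 0.0],
         [0.0, 0.2, 0.6, 0.2],
         [0.0, 0.1, 0.2, 0.7]]
LOW = {0, 1}                          # the cells making up a < x <= c


def k_mass(y):
    """Probability under K(. ; y) of a response in (a, c]."""
    return sum(SMEAR[y][x] for x in LOW)


def predicted():
    """Specialization of (48) with a_1 = a_3 = a and a_2 = a_4 = c."""
    r_c = sum(F_PROBS[y] * k_mass(y) for y in CELLS)           # R(c)
    gamma = sum(F_PROBS[y] * k_mass(y) ** 2 for y in CELLS)    # gamma(a, c, a, c)
    return (THETA * r_c ** 2
            + (1 - THETA) * (2 * (1 - THETA) * r_c ** 2 + THETA * gamma) / (2 - THETA))


def simulated(trials, burn_in=1_000, seed=1):
    """Long-run frequency of a < X_n <= c and a < X_{n+1} <= c in one simulated run."""
    rng = random.Random(seed)
    p = [0.25] * 4                    # initial response distribution (assumption)
    prev_low = False
    hits = counted = 0
    for n in range(trials):
        low = rng.choices(CELLS, weights=p)[0] in LOW
        if n > burn_in:
            counted += 1
            hits += low and prev_low
        prev_low = low
        y = rng.choices(CELLS, weights=F_PROBS)[0]      # noncontingent reinforcement
        p = [(1 - THETA) * p_x + THETA * k_x for p_x, k_x in zip(p, SMEAR[y])]  # axiom (40)
    return hits / counted
```

With these choices `predicted()` is noticeably larger than $R(c)^2$, reflecting the dependence between successive responses, and `simulated(200_000)` should match it to within sampling error.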